The generalization properties of an attractor network of
non-monotonic neurons which infers concepts from samples
are studied. The macroscopic dynamics for the overlap
between the state of the neurons and the concepts, as well
as for the activity of the neurons, are obtained and studied
numerically. Complex behavior, leading from fixed points to
chaos through a cascade of bifurcations, is found when we
increase the correlation between samples, decrease the
activity of the samples and the load of concepts, or tune
the threshold of fatigue of the neurons. Both the information
dimension and the Liapunov exponent are given, and a phase
diagram is built.

PACS numbers: 87.10, 64.60c 



I. INTRODUCTION 

There are two motivations for building models of the brain
as an associative memory that are more sophisticated than
the original Hopfield model for neural networks. One is
closeness to facts observed in real neural systems; the other
is the attempt to attain more complex learning abilities.
Among the successful attempts of the former kind are the
multi-state neuron models [Q, which include three-state,
analog and non-monotonic neurons. The capability of
generalization, the inference of rules from examples, is an
instance of the latter j|, ||. Categorization, the capability
to retrieve patterns of activity at different levels of a
hierarchical classification, is another instance. Here, we
work out a connection between the multi-state neural networks
and the categorization networks, which leads to a new kind of
generalization: the property of such neural devices to infer a
full concept from small samples of that concept. In most
neural models of learning (see ref. || and references
therein), the generalization function measures the ability of
the network to give the right answer to each question after
being trained with samples of question-answer pairs. In the
present model, by contrast, the samples are patterns which
carry information about the concepts, and the concepts can be
identified with the answers.

The multi-state neuron model was introduced to account for
some degree of ignorance about pieces of the full pattern.
It differs qualitatively from the two-state model because,
in the absence of part of the information, fewer bits are
required to represent the so-called small pattern: one picks
up information from the active sites only, keeping the
inactive sites off. Several models of multi-state neurons
have been studied with the Hebbian learning algorithm. The
behavior of the analog neural network was studied first in
the case of binary memorized patterns ||, and yields a phase
diagram similar to that of stochastic binary neurons, with
the temperature T replaced by the inverse of the gain
parameter (the slope of the transfer function at the origin).
The three-state neural network in the presence of three-state
uncorrelated patterns was studied within the extremely
diluted synapse scheme, showing an enhancement of the storage
capacity with an adequate control of its firing threshold.
This is more notable when the pattern activity (the fraction
of sites of the pattern with a non-vanishing state) is small
|j. Non-monotonic neural networks, which take into account
the fatigue of each neuron after being exposed to a large
post-synaptic potential, were studied by means of a
signal-to-noise analysis. This network exhibits an
interesting super-retrieval phase, with vanishing error even
for an extensive number of learned patterns. If the neurons
are allowed to switch their state to the opposite of the sign
of their local field, the capacity of the network becomes
even larger than that of three-state neurons || .

For all these cases a parallel deterministic dynamics 
was assumed given by the set of equations 



$$\sigma_{i,t+1} = F_\theta(h_{it}), \qquad i = 1, \ldots, N, \tag{1}$$



where $\sigma_{it}$ is the neuron state of site i at time t,
$\theta$ is the threshold parameter, which represents the
deviation of the signal function, and, as usual, only odd
bounded input/output (I/O) functions $F_\theta$ are
considered. The local field of site i at time t is

$$h_{it} = \sum_{j(\neq i)}^{N} J_{ij}\,\sigma_{jt}, \tag{2}$$

$J_{ij}$ being the elements of the synaptic matrix. In the
case of three-state neurons and patterns, the existence of a
threshold for which the retrieval is optimized was also found
by statistical mechanical techniques within the
replica-symmetric approximation [10], but this approach
cannot be used for the non-monotonic network, since it does
not have an energy function [11].

The task of generalization by a neural network can be
realized in a variety of contexts. One kind is
categorization, which takes place if we use an alternative
Hebbian learning algorithm which stores s examples having
correlation b with one hierarchical ancestor, for each of
the p concepts. For the connected model, in the context
of an attractor neural network, the following modified
Hebbian learning algorithm has been studied H :

$$J_{ij} = \frac{1}{N}\sum_{\mu=1}^{p}\sum_{\rho=1}^{s} \eta_i^{\mu\rho}\,\eta_j^{\mu\rho}. \tag{3}$$

The correlation of the learning example $\eta^{\mu\rho}$ with
one concept of the set is
$\langle \eta_i^{\mu\rho}\,\xi_j^{\nu} \rangle = b\,\delta_{\mu\nu}\delta_{ij}$.
The phase transition from a disordered to a generalization
phase, where the neurons retrieve one concept, was found to
be discontinuous in b for a fully connected network [12], or
smooth for a diluted network [13]. After sufficiently
increasing s or b, and decreasing $\alpha = p/N$, the error
in the generalization becomes small enough for the task to be
considered successful.

Another interesting kind of generalization is inference.
The coherence between the learned patterns with activity
$a \ll 1$ allows many patterns to be retrieved simultaneously
[14]. Then, by learning small patterns, we can infer the
existence of a whole pattern, with activity $a \sim 1$.
By enlarging the effective size of the pattern, we can
extract much more information than the original patterns
contain. For instance, we would see the wood where before we
had only seen trees. To obtain such an inferential property,
however, a more sophisticated algorithm is required.
Fortunately, it comes from a modified version of the Hebbian
algorithm in Eq. (3). Nevertheless, making a connection
between generalization and multi-state neurons requires a
mathematically difficult effort. The only investigation
treating generalization with analog neurons [15] uses binary
examples. It is therefore worth analyzing such models in
their simpler, extremely diluted version, which yields an
exactly soluble dynamics and is biologically relevant at the
same time [16]. In this version, a network of three-state
monotonic neurons shows a clear improvement of its
performance as a generalization device if small-activity
examples are learned @.

We describe the model of a network of non-monotonic
neurons in the next section. After obtaining the recursion
relations for the inferential properties in Section III, we
present our conclusions in the last section, drawing the
generalization curves with special attention to the
non-steady solutions.

II. THE MODEL 

We adopt the dynamics given in Eqs. (1)-(2), and start by
defining an I/O function. Although most works employ
stair-like (as modelled by the q-Ising network) or other
monotonic functions $F_\theta$, we avoid this restriction
and choose instead



$$F_\theta(x) = \begin{cases} \mathrm{sgn}(x), & |x| < \theta, \\ 0, & |x| > \theta. \end{cases} \tag{4}$$

Thus, the I/O function tells us how the network updates
each neuron, which becomes fatigued outside the interval
$|h_{it}| < \theta$, according to Eq. (1).
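
As a minimal sketch of this update rule, the Python code below implements the non-monotonic I/O function of Eq. (4) and one parallel step of the dynamics. The function names and the use of NumPy are illustrative assumptions, not part of the original work, and the synaptic matrix `J` is assumed to have a zero diagonal.

```python
import numpy as np

def f_theta(x, theta):
    """Non-monotonic I/O function: sgn(x) for |x| < theta, 0 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < theta, np.sign(x), 0.0)

def parallel_step(J, sigma, theta):
    """One parallel update sigma_i <- F_theta(h_i), with h_i = sum_j J_ij sigma_j."""
    h = J @ sigma  # local fields; J is assumed to have zero diagonal
    return f_theta(h, theta)
```

A neuron whose local field exceeds the threshold in magnitude is switched off ("fatigued"), which is what makes the transfer function non-monotonic.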

For the synaptic interactions we will assume the Hebbian
algorithm in Eq. (3), but the examples to be learned will be
three-state variables, like the neuron state itself. In order
to preserve the odd symmetry of the neurons, those patterns
are uniformly distributed around the zero state. Thus the
examples $\eta_i^{\mu\rho}$ are independent random variables
built from the concepts $\xi_i^{\mu}$ through the following
stochastic process:

$$\eta_i^{\mu\rho} = \xi_i^{\mu}\,\lambda_i^{\mu\rho}, \qquad \langle \lambda_i^{\mu\rho} \rangle \equiv b, \qquad \langle (\lambda_i^{\mu\rho})^2 \rangle \equiv a, \tag{5}$$

where $\xi_i^{\mu} = \pm 1$ with equal probability. The new
random variables introduced here, $\lambda_i^{\mu\rho}$, are
characterized by their mean b and their mean square a, for
all examples $\eta^{\mu\rho}$.

Thus the parameter a is the activity of the examples
themselves, while b is the correlation between the examples
and their respective concept. On the one hand, we can
recover the pure generalization model Q by setting
$\lambda_i^{\mu\rho} = \pm 1$ ($a = 1$) with a bias b for the
positive value, and threshold $\theta \to \infty$. In this
simple limit the neurons are thought of as being subject to
background noise, perhaps due to some dirtiness in the
pattern. On the other hand, the pure multi-state model can
also be obtained by taking the number of examples $s = 1$ in
Eq. (3) and correlation $b = 1$. A low activity $a \ll 1$
indicates that at many sites the patterns are not active,
$|\eta_i^{\mu\rho}| \neq 1$, with the effective size of the
learned patterns being $N_e = aN$. So, when the activity a is
not close to 1, we can speak of a small pattern jjj. The new
viewpoint of our model is the following: the small examples
are samples of the full-activity concepts to be inferred.
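
The stochastic process that builds examples from concepts can be sketched as follows. This is only an illustration under stated assumptions: the variables λ are taken to be three-state with P(+1) = (a+b)/2, P(-1) = (a-b)/2 and P(0) = 1-a, which is the simplest distribution with mean b and mean square a (it requires |b| ≤ a), and the couplings use a 1/N normalization. The function names are hypothetical.

```python
import numpy as np

def sample_examples(xi, s, a, b, rng):
    """Build three-state examples eta = xi * lambda from +-1 concepts.

    xi has shape (p, N); the result has shape (p, s, N).
    Assumed distribution: P(+1)=(a+b)/2, P(-1)=(a-b)/2, P(0)=1-a.
    """
    if not (0.0 < a <= 1.0 and abs(b) <= a):
        raise ValueError("need 0 < a <= 1 and |b| <= a for three-state lambda")
    p, n_sites = xi.shape
    probs = [(a - b) / 2.0, 1.0 - a, (a + b) / 2.0]
    lam = rng.choice([-1, 0, 1], size=(p, s, n_sites), p=probs)
    return xi[:, None, :] * lam

def hebbian_couplings(eta):
    """Hebbian matrix summed over all concepts and examples (1/N assumed)."""
    p, s, n_sites = eta.shape
    flat = eta.reshape(p * s, n_sites).astype(float)
    J = flat.T @ flat / n_sites
    np.fill_diagonal(J, 0.0)  # no self-coupling
    return J
```

With a small activity a, most entries of each example are zero, so each sample carries information about only a fraction of the sites of its concept.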

The task of generalization (inference) is successful if
the distance between the state of the neurons and the
concept $\xi^{\mu}$, defined as
$E_t^{\mu} \equiv \frac{1}{N}\sum_i |\xi_i^{\mu} - \sigma_{it}|$,
becomes small after some time t. This is the so-called
Hamming distance, which in this context is called the
generalization error. In order to measure the quality of the
retrieval of the small patterns, one needs to consider a
Euclidean quadratic distance instead of the Hamming distance,
but we are interested exclusively in the capacity of the
network to infer a larger concept of full activity from the
samples, in which case $E_t^{\mu}$ suffices.
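
A short numerical illustration of this error measure, assuming it is normalized by the number of sites N: for a concept with entries ±1 and a three-state neural configuration, |ξ - σ| = 1 - ξσ at every site, so the error equals one minus the overlap with the concept. The helper names below are hypothetical.

```python
import numpy as np

def generalization_error(xi, sigma):
    """Site-averaged distance between a +-1 concept and a state in {-1, 0, 1}."""
    return float(np.mean(np.abs(xi - sigma)))

def concept_overlap(xi, sigma):
    """Overlap M = (1/N) sum_i xi_i sigma_i."""
    return float(np.mean(xi * sigma))
```

An inactive site (σ = 0) contributes 1 to the sum, a correctly aligned site contributes 0, and a misaligned site contributes 2.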

Remark: Since $E_t^{\mu}$ is $\mu$-dependent, it looks like a
training error with respect to just one pattern j^]. However,
it does not depend on the examples $\eta^{\mu\rho}$, being
indeed a generalization error, which is p-fold degenerate in
the concepts, and a particular state $\sigma$ near $\xi^{1}$
can be chosen.

The relevant order parameters for the dynamics at a given
time t, when the state of the network is given by
$\{\sigma_{it}\}$, are the retrieval overlaps
of the $\rho$-th example of the $\mu$-th concept. They are
normalized parameters within the interval [-1, 1], which
attain the extreme value 1 whenever
$\eta_i^{\mu\rho} = \sigma_{it}$, by virtue of Eq. (5). Using
this definition, with the synaptic interaction in Eq. (3),
the local field in Eq. (2) becomes



It accounts for the active neurons, and plays a role similar
to that of the spin-glass parameter in the thermodynamic
equilibrium approach for binary neurons, since it allows
one to measure the degree of order even when there is no
retrieval at all [18], frjll . In both of the last two
equations, we have used the LLN for a sum of IRV, with
vanishing fluctuations, in the thermodynamic limit.



(7) 



Next we need to analyze the evolution of the $p\,s$ coupled
equations instead of the N original Eqs. (1).

Because we are interested in the generalizing property of
our network, we take an initial configuration whose
retrieval overlaps are macroscopic, of order O(1), only for
the s examples of a given concept, let us say the first one,
and symmetric (equal for all $\rho$). We write
$m_{N,t=1} = \frac{1}{s}\sum_{\rho} m^{1\rho}_{N,t=1}$ for the
symmetric overlap. In the thermodynamic limit, the retrieval
overlaps $m^{1\rho}_{N,t=1}$ in Eq. (6) are infinite sums of
independent random variables (IRV), whose fluctuations around
their mean value $\langle\langle m^{1\rho}_{N,t=1} \rangle\rangle$
can be neglected. Then the Law of Large Numbers (LLN) applies
to give

$$m_{t=1} = \lim_{N\to\infty} m_{N,t=1} = \frac{1}{a}\,\langle\langle x_s\, F_\theta(\Lambda_{t=0}) \rangle_{x_s} \rangle_{\omega}, \tag{8}$$

which is independent of the site i. Here we have defined the
new field variable
$\Lambda_{t=0} \equiv \xi^{1} h_{t=0} = m_{t=0}\, s\, a\, x_s + \omega_0$,
where $m_{t=0}$ is the initial symmetric retrieval overlap,
$x_s \equiv \frac{1}{s}\sum_{\rho} \lambda^{1\rho}$, and
$\omega_0$ is the noise produced by the $p-1$ residual
concepts in Eq.(||). The averages in the brackets are over
both the $x_s$ and $\omega_0$ terms in the field. We have
used the oddness of $F_\theta$, and written its argument
here as a sum of two different kinds of terms. The first one
favors ordering in the direction of the first concept, while
the second, $\omega_0$, adds further noise to the original
mistakes represented by those sites where
A, 
The most interesting feature for us is the generalizing
property of our network. It is characterized by the overlap
of the neural state with the first concept, given at the
first time step by

$$M_{t=1} = \lim_{N\to\infty} \frac{1}{N}\sum_i \xi_i^{1}\,\sigma_{i,t=1} = \langle\langle F_\theta(\Lambda_{t=0}) \rangle_{x_s} \rangle_{\omega}, \tag{9}$$

which is related to the generalization error (the Hamming
distance) by $E^{1}_{t=1} = 1 - M_{t=1}$. For multi-state
neurons it is useful to define the dynamical activity order
parameter, given at the first time step by

$$Q_{t=1} = \lim_{N\to\infty} \frac{1}{N}\sum_i (\sigma_{i,t=1})^2 = \langle\langle [F_\theta(\Lambda_{t=0})]^2 \rangle_{x_s} \rangle_{\omega}. \tag{10}$$



III. DILUTED DYNAMICS 

Although it is easy to solve the single-time Eq. (9) and
to obtain the generalization error $E_t$, the recursion
relations for arbitrary time t are not easily solved. We
therefore use the extremely diluted synapse approximation,
for which the form of the first-time-step equations remains
exact for any number of time steps. In this limiting
situation the synaptic interactions vanish for almost all
pairs of neurons {ij}, and are of the form given in Eq. (3)
only for a small fraction $C/N \ll 1$ of them. Eqs. (8)-(10)
then hold for any t, with the following simple distribution
of the noise caused by the examples of the $p-1$ residual
concepts: $\omega_t \doteq z_p\sqrt{\alpha Q_t r}$ (equality
in distribution), where $\alpha = p/C$,
$r = s[a^2 + (s-1)b^4]$ and $z_p = N(0,1)$ is a Gaussian
random variable with mean $\langle z_p \rangle = 0$ and unit
variance. $Q_t$ is the dynamical activity at time t.
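
One way to read the factor r is as the second moment of the sum over the s examples of a residual concept, $\sum_\rho \eta_i^{\mu\rho}\eta_j^{\mu\rho}$, at two different sites: the s diagonal terms contribute $a^2$ each and the s(s-1) cross terms contribute $b^4$ each. The Monte Carlo sketch below checks this reading numerically. It assumes three-state variables λ with P(+1) = (a+b)/2, P(-1) = (a-b)/2, P(0) = 1-a, which is an illustrative choice; the function names are hypothetical.

```python
import numpy as np

def noise_factor(s, a, b):
    """r = s [a^2 + (s - 1) b^4], as quoted in the text."""
    return s * (a ** 2 + (s - 1) * b ** 4)

def sampled_noise_factor(s, a, b, n_samples, rng):
    """Monte Carlo estimate of <(sum_rho eta_i eta_j)^2> for one concept.

    Since xi = +-1, the product xi_i xi_j squares to 1 and only the
    lambda variables matter.
    """
    probs = [(a - b) / 2.0, 1.0 - a, (a + b) / 2.0]
    lam_i = rng.choice([-1, 0, 1], size=(n_samples, s), p=probs)
    lam_j = rng.choice([-1, 0, 1], size=(n_samples, s), p=probs)
    x = np.sum(lam_i * lam_j, axis=1).astype(float)
    return float(np.mean(x ** 2))
```

The $b^4$ term grows with s(s-1), so for many correlated examples the residual concepts contribute coherently to the noise.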

We will also use an approximation for the case of many
examples ($s > 10$): $x_s = b + z_s\sqrt{(a - b^2)/s}$, with
$z_s = N(0,1)$ independent of $z_p$. With these remarks,
after some algebra with both Gaussian variables $z_s$ and
$z_p$, we can write the dynamics at an arbitrary time step
for the macroscopic parameters, with the I/O function given
by Eq. (4).
The dynamical activity is the following:

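$$Q_{t+1} = \frac{1}{2}\left[\mathrm{erf}(\Delta_+) - \mathrm{erf}(\Delta_-)\right], \qquad \Delta_\pm \equiv \frac{s a b\, m_t \pm \theta}{\sqrt{v_t}}, \tag{11}$$

with $v_t = s a^2 (a - b^2)(m_t)^2 + \alpha r Q_t$, and the
symmetric retrieval overlap is

$$m_{t+1} = \frac{b}{a}\, M_{t+1} + m_t\,(a - b^2)\, C_{t+1}, \tag{12}$$

where we have defined
$\mathrm{erf}(x) \equiv \int_{-x}^{x} dy\, \varphi(y)$,
$\varphi(y) \equiv \exp(-y^2/2)/\sqrt{2\pi}$. Here,

$$M_{t+1} = \mathrm{erf}\!\left(\frac{s a b\, m_t}{\sqrt{v_t}}\right) - \frac{1}{2}\left[\mathrm{erf}(\Delta_+) + \mathrm{erf}(\Delta_-)\right] \tag{13}$$

is the overlap of generalization, and

$$C_{t+1} = \langle F'_\theta(\Lambda_t) \rangle_z = \frac{1}{\sqrt{v_t}}\left[2\varphi\!\left(\frac{s a b\, m_t}{\sqrt{v_t}}\right) - \varphi(\Delta_+) - \varphi(\Delta_-)\right]. \tag{14}$$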



We make no restrictions on the values which the parameters
b and a can assume within the (0, 1) interval, except that
they must satisfy $a \geq b^2$ (the equality corresponding
to constant microscopic activities $\lambda = b$).
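
The recursion relations can be iterated numerically in a few lines. The sketch below is not the authors' code; it assumes the following reading of Eqs. (11)-(14): the local field is Gaussian with mean $\mu_t = s a b\, m_t$ and variance $v_t = s a^2 (a-b^2) m_t^2 + \alpha r Q_t$, with $\Delta_\pm = (\mu_t \pm \theta)/\sqrt{v_t}$, $Q_{t+1} = \frac{1}{2}[\mathrm{erf}(\Delta_+) - \mathrm{erf}(\Delta_-)]$, $M_{t+1} = \mathrm{erf}(\mu_t/\sqrt{v_t}) - \frac{1}{2}[\mathrm{erf}(\Delta_+) + \mathrm{erf}(\Delta_-)]$, $C_{t+1} = [2\varphi(\mu_t/\sqrt{v_t}) - \varphi(\Delta_+) - \varphi(\Delta_-)]/\sqrt{v_t}$ and $m_{t+1} = (b/a) M_{t+1} + m_t (a-b^2) C_{t+1}$, where erf(x) denotes the unit Gaussian integrated over [-x, x]. All function names are hypothetical.

```python
import math

def erf_sym(x):
    """Unit-Gaussian probability of [-x, x] (odd in x); assumed erf convention."""
    return math.erf(x / math.sqrt(2.0))

def phi(y):
    """Unit Gaussian density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def step(m, Q, s, a, b, alpha, theta):
    """One step of the macroscopic map; returns (m_next, Q_next, M_next)."""
    r = s * (a ** 2 + (s - 1) * b ** 4)
    v = s * a ** 2 * (a - b ** 2) * m ** 2 + alpha * r * Q
    sv = math.sqrt(max(v, 1e-12))  # guard against a vanishing field variance
    mu = s * a * b * m
    d_plus, d_minus = (mu + theta) / sv, (mu - theta) / sv
    Q_next = 0.5 * (erf_sym(d_plus) - erf_sym(d_minus))
    M_next = erf_sym(mu / sv) - 0.5 * (erf_sym(d_plus) + erf_sym(d_minus))
    C_next = (2.0 * phi(mu / sv) - phi(d_plus) - phi(d_minus)) / sv
    m_next = (b / a) * M_next + m * (a - b ** 2) * C_next
    return m_next, Q_next, M_next

def generalization_errors(m0, Q0, n_steps, **params):
    """Iterate the map and return the errors E_t = 1 - M_t."""
    m, Q, errors = m0, Q0, []
    for _ in range(n_steps):
        m, Q, M = step(m, Q, **params)
        errors.append(1.0 - M)
    return errors
```

For example, `generalization_errors(1.0, 0.2, 500, s=20, a=0.2, b=0.2, alpha=0.01, theta=1.0)` produces a trajectory of E_t whose late-time values can be plotted against b to look for fixed points, cycles or aperiodic behavior. In the limit of a very large threshold the map reduces to the monotonic sign function, with Q = 1.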

IV. ATTRACTORS AND CONCLUSIONS 

Two fixed-point ordered phases can appear: namely, the
Generalization phase $\{G : M > 0, Q > 0\}$ and the
Self-sustained activity (or microscopically chaotic [20])
phase $\{S : M = 0, Q > 0\}$. However, the most interesting
attractors are the non-steady macroscopic phases. Although
Eqs. (11)-(14) are deterministic, being averaged over the
stochasticity induced by the extensive load $p = \alpha C$,
some complex behavior remains present in the long-time
dynamics. A period-doubling generalization phase
$\{D : M_t > 0, Q_t > 0\}$ appears, without a fixed point,
where cyclic or chaotic attractors arise. It can be seen in
the generalization curves shown in the figures below.

In Fig. 1 (below) we see the dependence of the generalization
error $E_t$ on the sample correlation b and activity a, for
which we took $a = b$. Fixed values of the number of
examples, load rate and threshold of fatigue are used.
As b is increased up to $b_1 \approx 0.19$, the
generalization error has a fixed-point behavior. It initially
falls to an optimal value $E_t \approx 0.07$ at
$b_{op} \approx 0.15$. Then it reaches a first bifurcation,
beyond which it oscillates between two values, exhibiting a
periodic behavior. A 4-cycle is found after a second
bifurcation at $b_2 \approx 0.31$, and this period doubling
continues until a quasi-periodic behavior takes place at
$b_\infty \approx 0.35$. For $b_\infty < b < b_S$, regions of
chaos alternate with windows of periodicity. Beyond
$b_S \approx 0.6$, although the correlation is large, the
activity is large too, and it destroys the capacity of
generalization, so that $E_t = 1$. The same behavior was
qualitatively found as a function of the activity a
(correlation b), keeping b (a) fixed. For sufficiently low
activity (high correlation), $E_t$ oscillates aperiodically,
eventually coming close to any chosen initial value but never
equal to it.

In order to measure the degree of non-regular behavior we
calculated the Liapunov exponent in the region of $a = b$
above. It was estimated as [21]
$\lambda_L \approx \frac{1}{\tau}\ln[\delta m_\tau/\delta m_0]$,
for $\tau \gg 1$, where $\delta m_t$ is the distance between
two trajectories initially close to each other. It gives
positive values within the interval $b_\infty < b < b_S$,
attaining the value $\lambda_L \approx 0.34$ at
$b_c \approx 0.41$, as we can see in Fig. 1 (above). It
indicates how chaotic the oscillation of $E_t$ is in this
attractor, which shows sensitivity to initial conditions. We
also calculated the information dimension of the attractor,
estimated by [22] $d_H \approx \ln(N_r)/|\ln(r)|$, $r \ll 1$,
where $N_r$ is the number of balls of radius r necessary to
cover all points $E_t$. For the point $b_c$ we got
$d_H = 0.81$. The non-integer value of $d_H$ shows that this
attractor is a fractal.
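
The two-trajectory estimate of the Liapunov exponent can be sketched as below. Two points are hedged assumptions rather than the authors' procedure: the separation is rescaled to its initial size after every step (a standard way of keeping the two trajectories close when averaging over long times), and the logistic map is used purely as a stand-in for the macroscopic map, because its exponent at r = 4 is known to be ln 2.

```python
import math

def lyapunov_two_trajectories(f, x0, n_steps=5000, d0=1e-9, n_transient=500):
    """Average growth rate of the distance between two nearby trajectories of f."""
    x = x0
    for _ in range(n_transient):  # discard the transient
        x = f(x)
    y = x + d0
    total, counted = 0.0, 0
    for _ in range(n_steps):
        x, y = f(x), f(y)
        d = abs(y - x)
        if d == 0.0:  # trajectories merged numerically: re-separate and skip
            y = x + d0
            continue
        total += math.log(d / d0)
        counted += 1
        y = x + (y - x) * d0 / d  # rescale the separation back to d0
    return total / counted

def logistic(r):
    """Stand-in one-dimensional map x -> r x (1 - x)."""
    return lambda x: r * x * (1.0 - x)
```

Calling `lyapunov_two_trajectories(logistic(4.0), 0.3)` should give a value close to ln 2, while a map with a stable fixed point, such as `logistic(2.5)`, gives a negative exponent. A positive result signals sensitivity to initial conditions.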

The behavior as a function of $\theta$ is drawn in
Fig. 2 (below), where the effect of the fatigue is singled
out. When the threshold is small enough the generalization
is bad, because the local fields almost everywhere exceed
$\theta$, which drives the neurons into their fatigued state.
Beyond $\theta_1^- \approx 1.3$ the probability of the local
field being lower than $\theta$ becomes relevant, and a
periodic regime starts. A chaotic regime occurs for
$3.8 < \theta < 6$, when the local fields fluctuate around
$\theta$. An atypical exit from the chaotic regime occurs
when $\theta$ is so large that the local fields gradually
leave the non-sigmoidal regime, until at
$\theta_1^+ \approx 15$ a new fixed-point regime sets in, but
now with a good generalization.

A bifurcation diagram was also found as a function of the
load rate of concepts $\alpha$. The noise induced by the
saturation of concepts gives rise to large fluctuations of
the local fields. Thus the chaotic behavior, which implies a
very sensitive dependence of the neural states on their
previous states, is lost for large $\alpha$. A phase diagram
of the model is shown in Fig. 2 (center), for fixed values of
a, b, s. For small values of $\alpha$, a transition from an S
phase to a D phase occurs whenever the threshold of fatigue
crosses the solid curve. For larger values of $\alpha$, the
solid curve separates the S phase from a G phase. The G phase
is separated from the D phase by the dashed curve. In
contrast with the phase diagram obtained in [13], here no
phase $\{Z : M = 0, Q = 0\}$ can be reached, as can be seen
from Eq. (11) with $m_t = 0$, which reads
$Q_{t+1} = \mathrm{erf}(\theta/\sqrt{\alpha r Q_t})$. For
$Q_t \to 0$ we get $Q_{t+1} \to 1$.
The S phase competes in one region with the G phase,
but the latter is more stable overall in this region.
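
The absence of a zero-activity phase can be checked numerically with a few lines. This sketch assumes that the activity map at vanishing overlap reads $Q_{t+1} = \mathrm{erf}(\theta/\sqrt{\alpha r Q_t})$, with erf(x) denoting the unit Gaussian integrated over [-x, x], so that it corresponds to `math.erf(x / sqrt(2))` in Python. The function name is hypothetical.

```python
import math

def q_map(Q, theta, alpha, r):
    """Activity map at zero overlap; Q -> 0 is sent back towards 1."""
    if Q <= 0.0:
        return 1.0  # limit of erf(theta / sqrt(alpha r Q)) as Q -> 0+
    return math.erf(theta / math.sqrt(2.0 * alpha * r * Q))
```

Because a vanishing activity makes the noise vanish too, all fields then fall inside the interval where the neurons are active, and the activity is pushed back up. Iterating `q_map` from any starting value therefore never settles at Q = 0.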

In order to compare with the monotonic case, for which
the I/O function $F_\theta(x) = \mathrm{sgn}(x)$ for
$|x| > \theta$ ($= 0$ for $|x| < \theta$) can be taken, we
built the phase diagram $\alpha(\theta)$ of Fig. 2 (above).
The parameter $\theta$ here represents a firing threshold of
the neurons. There is no D phase in this case; instead, a Z
phase can appear for large enough values of $\theta$.

It is not too surprising that the motion of the neuron
states themselves can follow a chaotic trajectory, where
the memory of the initial configuration is not preserved.
But in this case the macroscopic parameter measuring the
retrieval of one pattern is $M_t = 0$ almost always, because
the motion is ergodic over the trajectory, running equally
over all possible states, the huge majority of which have
vanishing overlap with that pattern. This is the case of the
S phase. In the present model, however, the chaos appears in
the less complex macroscopic trajectories of the overlap, in
such a manner that almost always $M_t > 0$. We can then
conjecture that in the non-steady regimes the network
preserves a memory of which concept was used as a seed in the
initial configuration. Thus it cannot be related to the
properties of sequential generalization , for which a set of
concepts can be retrieved consecutively. In sequential
generalization the vector of overlaps $M_t$ can be roughly
orthogonal to its previous state, so that many other
directions become macroscopic at each time. Here, however,
only one concept is persistently retrieved, at varying
magnitude.

A similar result was recently found for the pure multi-state
model for retrieval of patterns, but using analog
non-monotonic neurons instead of our discrete neurons
p3| . This shows that the present complex behavior is
a consequence of the non-monotonicity rather than a
characteristic of the generalization model.

The diagrams in Figs. 1-2 demonstrate how a network of
non-monotonic neurons can exhibit complex behavior. The
coherent retrieval of samples leads to the ability to infer a
large-activity concept, even for a large load ratio. The
periodicity of the generalization can be controlled by the
activity of the samples, their correlation with each other,
and the gain parameter of the neurons. It would be worthwhile
to verify such behavior of the inferential properties with
other learning algorithms and higher levels of hierarchy.

FIG. 1. Below: Generalization error $E_t$ as a function of
the pattern correlation and activity $a = b$, with number of
examples $s = 20$, load rate $\alpha = 0.01$, and threshold
of fatigue $\theta = 1$. Above: The Liapunov exponent for the
attractor of the figure below.

FIG. 2. Below: Generalization error $E_t$ as a function of
$\theta$ with $b = 0.5 = a$, $s = 50$, $\alpha = 0.05$.
Center: The phase diagram $\alpha(\theta)$, with
$b = 0.5 = a$ and $s = 50$. The dashed curve separates the D
phase from the G phase, while the solid curve separates the S
phase from either the G or the D phase. Above: The phase
diagram $\alpha(\theta)$, with the same parameters as the
figure at the center, but with monotonic three-state neurons.